Constructing a Speech Translation System using Simultaneous Interpretation Data

Authors

  • Hiroaki Shimizu
  • Graham Neubig
  • Sakriani Sakti
  • Tomoki Toda
  • Satoshi Nakamura
Abstract

There has been a fair amount of work on automatic speech translation systems that translate in real time, serving as a computerized version of a simultaneous interpreter. Translation studies have observed that simultaneous interpreters use a number of techniques to make content easier to understand in real time, including dividing their translations into small chunks and summarizing less important content. However, most previous work has not specifically considered this fact, simply using translation data (made by translators) to train the machine translation system. In this paper, we examine the possibility of additionally incorporating simultaneous interpretation data (made by simultaneous interpreters) in the learning process. First, we collect simultaneous interpretation data from professional simultaneous interpreters at three levels of experience and analyze the data. Next, we incorporate the simultaneous interpretation data in the learning of the machine translation system. As a result, the translation style of the system becomes more similar to that of a highly experienced simultaneous interpreter. We also find that, according to automatic evaluation metrics, our system achieves performance similar to that of a simultaneous interpreter with one year of experience.

Related resources

Influence of pause length on listeners' impressions in simultaneous interpretation

We have been attempting to realize simultaneous machine interpretation. However, determining the interpreting utterance timing is as difficult as determining translation units. This remains a major concern for the development of such a speech translation system. It is also crucial for the system’s users that the speech generated by the system is clear and easy to listen to. In this paper, we fo...

Corpus analysis of simultaneous interpretation data for improving real time speech translation

Real-time speech-to-speech (S2S) translation of lectures and speeches requires simultaneous translation with low latency to continually engage the listeners. However, simultaneous speech-to-speech translation systems have been predominantly repurposing translation models that are typically trained for consecutive translation without a motivated attempt to model incrementality. Furthermore, the n...

Role of pausing in text-to-speech synthesis for simultaneous interpretation

The goal of simultaneous speech-to-speech (S2S) translation is to translate source language speech into the target language with low latency. While conventional S2S translation systems typically ignore source language acoustic-prosodic information such as pausing, exploiting such information for simultaneous S2S translation can potentially aid in the chunking of source text in...

Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation

With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues and employ consecutive interpretation, which uses a sentence as the translation unit. On the other hand, there has been little research on simultaneous interpreting, and recently, the language resources for promoting simultaneous interpreting r...

Rapid development of speech translation using consecutive interpretation

The development of a speech translation (ST) system is costly, largely because it is expensive to collect parallel data. A new language pair is typically only considered in the aftermath of an international crisis that creates a major need for cross-lingual communication. Urgency justifies the deployment of interpreters while data is being collected. In recent work, we have shown that audio record...

Publication date: 2013